Bump skill turn caps (review 30→200, gen 500→1000)#789

Merged
rdimitrov merged 1 commit into main from bump-skill-turn-caps
Apr 22, 2026

Conversation

@rdimitrov
Member

Context

PR #788 (Update stacklok/toolhive to v0.24.0) failed the augmentation run when `skill_review` hit the 30-turn cap mid-edit.

From run 24786157448:

| Session | Turns | Cost (USD) | Outcome |
| --- | --- | --- | --- |
| `skill_gen` | 67 | $4.72 | ✅ Success (under 500 cap), committed `42267da Document local vMCP CLI mode` |
| `skill_review` | 31 | $1.86 | ❌ Exit `error_max_turns` at turn 31 / 30 |

The review failure cascaded the workflow to `failure`, which skipped autofix and PR body augmentation. PR #788 is therefore stuck with an incompletely rendered body and no run-cost table, even though the skill's content work actually landed.
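The cascade hinges on how a session's exit is classified. As a minimal sketch, assuming the session result is available as a JSON object with `subtype` and `num_turns` fields (the field names are an assumption modeled on the `error_max_turns` subtype quoted above, and `classify_session` is a hypothetical helper, not part of the workflow), a cap hit could be told apart from other failures like this:

```python
import json

# Hypothetical helper: the field names "subtype" and "num_turns" are
# assumptions about the session's JSON result, not confirmed by this PR.
def classify_session(raw_result: str) -> str:
    """Classify one session result as 'success', 'cap_hit', or 'error'."""
    result = json.loads(raw_result)
    subtype = result.get("subtype")
    if subtype == "success":
        return "success"
    if subtype == "error_max_turns":
        # The turn cap was exhausted; the work may be only partially done.
        return "cap_hit"
    return "error"


# Turn counts and outcomes taken from run 24786157448.
gen = '{"subtype": "success", "num_turns": 67}'
review = '{"subtype": "error_max_turns", "num_turns": 31}'

print(classify_session(gen))     # success
print(classify_session(review))  # cap_hit
```

Separating `cap_hit` from other errors would let later steps decide whether to continue; whether the workflow should do that is a design choice this PR does not make.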

Why the cap was too tight

My original review cap of 30 was sized against silent / no-changes release runs (baseline 4-6 turns). For a real multi-file content release like v0.24.0, the editorial pass walks each edited file and makes tightening edits, so ~30-100 turns is a legitimate working range.

Changes

| Session | Old | New | Headroom rationale |
| --- | --- | --- | --- |
| `skill_review` | 30 | 200 | 2x-6x over observed 30-100 working range |
| `skill_gen` | 500 | 1000 | Defensive doubling; 500 never hit in production, but 1000 keeps us well clear of the 397-turn v3-test anomaly |

Caps still clip genuine runaways loudly; we raise them deliberately if a release ever genuinely needs more.
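The "2x-6x" figure for `skill_review` is the new cap divided by each end of the observed working range. A small sketch of that arithmetic follows; the `headroom` function is illustrative only and does not exist in the workflow:

```python
def headroom(cap: int, low: int, high: int) -> tuple[float, float]:
    """Return (tightest, loosest) multiples of `cap` over a working range."""
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    # The tightest margin is measured against the top of the range.
    return cap / high, cap / low


tight, loose = headroom(200, 30, 100)
print(f"{tight:.1f}x to {loose:.1f}x")  # 2.0x to 6.7x
```

The upper end comes out nearer 6.7x, so the "6x" in the table is a conservative rounding.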

Follow-up for PR #788

Once this merges, PR #788 needs a retry via `gh workflow run upstream-release-docs.yml -f pr_number=788` to complete the augmentation step. The existing skill content commit (`42267da`) stays; retry re-runs everything but should now finish review + autofix + body augmentation cleanly.

PR #788 (Update stacklok/toolhive to v0.24.0) failed the run
when skill_review hit the 30-turn cap mid-edit. Details from run
24786157448:

- skill_gen: 67 turns / $4.72 (under 500 cap) -- succeeded,
  committed `42267da Document local vMCP CLI mode`
- skill_review: 31 turns / $1.86 -- exited with subtype
  `error_max_turns`, cascading workflow to `failure`, which
  skipped autofix and PR body augmentation

My initial cap of 30 for review was sized against silent/no-
changes releases (4-6 turns baseline). For a real multi-file
content review the editorial pass walks each edited file and
makes tightening edits; ~30-100 turns is legitimate working
range.

Changes:
  - skill_review: 30 -> 200 (2x-6x headroom over working range)
  - skill_gen:    500 -> 1000 (defensive doubling; 500 was
    never hit in production, but 1000 keeps us well clear of
    the 397-turn v3-test anomaly)

Hitting either cap still produces a loud failure; raise
deliberately if a release genuinely needs more.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 22, 2026 15:29
@vercel

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs-website | Ready | Preview, Comment | Apr 22, 2026 3:30pm |

@rdimitrov rdimitrov merged commit 41520bc into main Apr 22, 2026
5 checks passed
@rdimitrov rdimitrov deleted the bump-skill-turn-caps branch April 22, 2026 15:30
Contributor

Copilot AI left a comment

Pull request overview

Adjusts the upstream-release-docs GitHub Actions workflow to reduce failures from hitting Claude Code session turn caps during larger, multi-file release doc updates (notably the review pass that previously capped at 30 turns).

Changes:

  • Increase skill_gen (generation) --max-turns from 500 to 1000.
  • Increase skill_review (editorial review) --max-turns from 30 to 200.
  • Update inline workflow comments documenting the rationale for the new caps.

Comment on lines +551 to +557
# --max-turns 1000: observed gen baselines are 20 turns
# (silent) to 152 (full content rebuild). 500 was the
# initial cap; bumped to 1000 for extra headroom on
# multi-feature releases and to stay well above the
# suspected-looping 397-turn v3-test run (still clips
# genuine runaways). Hitting the cap produces a loud
# failure -- raise deliberately if a release needs more.
Copilot AI Apr 22, 2026

The updated rationale mixes two different “observed baseline” ranges (20–152) while also referencing a 397-turn run. This is internally inconsistent and makes it hard to interpret the headroom goal. Consider rephrasing to separate “typical/observed” from “anomalous” runs (or update the baseline range to include the 397 outlier if it’s being treated as legitimate).

Suggested change
- # --max-turns 1000: observed gen baselines are 20 turns
- # (silent) to 152 (full content rebuild). 500 was the
- # initial cap; bumped to 1000 for extra headroom on
- # multi-feature releases and to stay well above the
- # suspected-looping 397-turn v3-test run (still clips
- # genuine runaways). Hitting the cap produces a loud
- # failure -- raise deliberately if a release needs more.
+ # --max-turns 1000: typical observed gen runs span 20
+ # turns (silent) to 152 (full content rebuild). 500 was
+ # the initial cap; bumped to 1000 for extra headroom on
+ # multi-feature releases. Separately, a suspected-looping
+ # v3-test run reached 397 turns, so 1000 stays well above
+ # that anomalous case while still clipping genuine
+ # runaways. Hitting the cap produces a loud failure --
+ # raise deliberately if a release needs more.
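To make the typical-versus-anomalous distinction concrete, the margins can be computed separately for each case. This is a rough sketch using only the turn counts quoted in the workflow comment; the constant and function names are hypothetical:

```python
# Turn counts quoted in the workflow comment for skill_gen.
TYPICAL_MAX_TURNS = 152   # full content rebuild, top of the typical range
ANOMALY_TURNS = 397       # suspected-looping v3-test run


def margins(cap: int) -> dict[str, float]:
    """Return how many times `cap` exceeds the typical max and the anomaly."""
    return {
        "over_typical": round(cap / TYPICAL_MAX_TURNS, 1),
        "over_anomaly": round(cap / ANOMALY_TURNS, 1),
    }


print(margins(500))   # {'over_typical': 3.3, 'over_anomaly': 1.3}
print(margins(1000))  # {'over_typical': 6.6, 'over_anomaly': 2.5}
```

Read this way, the old cap of 500 cleared the anomalous run by only about 1.3x, while 1000 leaves roughly 2.5x over it and about 6.6x over the largest typical run.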

Comment on lines +745 to +751
# --max-turns 200: initial cap of 30 was sized against
# silent-release baselines (4-6 turns) and was too tight
# for real content reviews. v0.24.0 (PR #788) hit it at
# turn 31 mid-review and failed the run; the editorial
# pass genuinely needs ~30-100 turns to walk a multi-
# file content PR. 200 gives 2x-6x headroom over that
# working range while still clipping a runaway.
Copilot AI Apr 22, 2026

The new --max-turns 200 rationale says the editorial pass can legitimately take ~30–100 turns, but an earlier nearby comment still states review is “4–5 turns in every run so far” and refers to “10 / 30” buffers. This creates conflicting documentation in the same step; please update the earlier comment (or remove/replace it) so the timeout/turn-cap guidance is consistent.
